forked from alphagov/notifications-utils
upstream alphagov updates #11
Open
frankmolenaar1986 wants to merge 289 commits into Worth-NL:main from alphagov:main
Add a command to copy `pyproject.toml` into apps
This is better than looking at `requirements.in` because:
- `requirements.in` isn’t always present (for example when running apps in Docker)
- `requirements.txt` is more likely to reflect what’s actually installed (since that’s what we tell `pip` to look at)
Look at `requirements.txt` to determine installed version
NotifyTask: include pid and other structured fields in completion logs
For context, when the admin app wants to get data from our API it will first try to get the data from redis. If the data doesn't exist in redis, it will get the data from the API and then finish by setting that data in redis.

When the admin app wants to update data for our API, it currently calls out to the API to update the data and then deletes any existing/relevant data cached in redis. The next time it tries to get the data that it just updated, it will use the usual approach for getting data, which will set the redis key too.

There is a problem when the admin app wants to update something in the database. At the moment, it will start by calling the API to update the database and, if that is successful, it will then attempt to delete the relevant cache keys in redis. But redis may not always be available and in that case, it will
- fail to delete the cache key
- catch and ignore the exception raised when trying to delete the redis cache key

This leads to the change the user requested being made in the database, but the cache still has the old data in it! This is bad because our apps check the cache first and this can result in us sending out incorrect emails using old templates, for example.

We have a runbook to manually recover from this position if redis has downtime and delete queries fail during this time: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-redis-outages

Note, there is no issue with `cache.set` calls. If the call to redis fails as part of this, no stale data is produced. The database hasn't been changed and either redis hasn't been changed or hasn't had the current data added to it.

This commit changes our caching logic so that we can't end up with stale data in redis if a delete to redis fails. If a delete to redis fails, then we error early and don't attempt to update the database.

The trade-off we make here is that now a user will see an error page if their request failed to delete from redis, whereas before they would have gotten a 200 (but ended up with stale data). We think it is worse to have stale data than it is to fail a user's request.
In the previous commit, we tried to avoid stale data in redis if redis is unavailable. However, this introduced a race condition where another request re-populates the redis key with the old value before the database has been updated.

We fix this by also deleting the cache key after the API update so that, if that happened, we would remove the stale data after the API update.

There is a small edge case that could still end up with stale data that this commit doesn't solve (and we may just have to tolerate). Suppose we hit the race condition above, so stale data is reinserted in redis before the database update, then the database update happens and redis goes down at this point. The second cache clear will then fail and we will be left with stale data. This should be a rare case as it involves both a race condition (two requests for the same object at very similar times) and redis going down mid-request.

Again, we lean towards throwing an uncaught exception if a redis delete fails. It means there is a chance of stale data, and the user should hopefully check if their action was successful and retry if they see stale data in the user interface (which should then hopefully fix the problem).
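The delete, update, delete ordering described in the two commit messages above can be sketched roughly as follows. This is an illustration only, not the code in this repository: `update_with_cache_invalidation`, `cache_delete` and `api_update` are hypothetical names, and the cache and API are passed in as plain callables.

```python
def update_with_cache_invalidation(cache_delete, api_update, keys):
    """Hypothetical helper: invalidate, update, then invalidate again."""
    # 1. Delete first. Exceptions are deliberately not caught, so if the
    #    cache is unavailable we fail before touching the database.
    for key in keys:
        cache_delete(key)

    # 2. Only now ask the API to change the database.
    result = api_update()

    # 3. Delete again, in case a concurrent read re-populated the key with
    #    the old value between steps 1 and 2. A failure here is also left
    #    uncaught, so the user sees an error rather than a silent 200.
    for key in keys:
        cache_delete(key)

    return result
```

If the first delete raises, `api_update` is never called, which is the "error early" behaviour described above. The remaining gap is the rare case where step 3 fails after a concurrent read has already re-cached the old value.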
Ensure no stale data in redis if redis errors when deleting cache key
first in a series of commits to move phone number and email validation in to one centralised location too
also have a new shared base class so we don't have phone number errors extending from email address errors. This was raised six (!) years ago but never sorted out[^1]. [^1]: https://github.com/alphagov/notifications-api/pull/1590/files#diff-463068b4d0340d64d95bb9d1e816b1d4da643fd30341c637f2745fff3609559bR46-R47
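As a rough illustration of the shared base class idea, the hierarchy could look something like this. The class names are assumptions chosen for the sketch and may not match the ones introduced in these commits.

```python
class InvalidRecipientError(Exception):
    """Hypothetical shared base for all recipient validation errors."""


class InvalidEmailError(InvalidRecipientError):
    """Raised for a badly formed email address."""


class InvalidPhoneError(InvalidRecipientError):
    """Raised for a badly formed phone number."""


def is_recipient_error(exc: Exception) -> bool:
    # Callers can catch the base class without caring which channel failed.
    return isinstance(exc, InvalidRecipientError)
```

With this shape, a phone number error is no longer an instance of the email error, but code that wants to handle both can still catch the common base.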
note that i moved the email-specific regex constants out of notifications_utils/__init__.py and into the email_address file itself. these are only used by the email_address code itself and in one place in document download, and now that they're in a smaller file they seem reasonable to place here
mostly so that in places that use recipient.py, we'll be forced to update the imports so that they're using the new paths to keep everywhere up to date
Refactor phone/email/letter validation
The postcode zone is the first 1, 2 or 3 letters of the postcode. There are about 120 postcode zones defined in the UK.

At the moment we allow postcodes to start with any 1 or 2 letters. This means we allow plenty of postcodes which aren’t real, for example `NF1 1AA`.

By restricting the first n letters of the postcode to only valid postcode zones we can reduce the number of letters which DVLA manually flag to us as having invalid addresses.

***

List taken from: http://www.ons.gov.uk/ons/guide-method/geography/products/postcode-directories/-nspp-/onspd-user-guide-and-version-notes.zip
This saves us clogging up `postal_address.py` with a long list
`GX` is not a valid UK postcode zone so we will reject it anyway, without having to treat it as a special case.
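A minimal sketch of the zone check described above, assuming a lookup against a set of known zones. `EXAMPLE_POSTCODE_ZONES` is a small illustrative subset, not the real list of roughly 120 zones, and `has_valid_zone` is a hypothetical name.

```python
import re

# Illustrative subset only; the real list is much longer.
EXAMPLE_POSTCODE_ZONES = frozenset({"B", "BT", "CF", "E", "EH", "M", "SW"})


def has_valid_zone(postcode: str, zones=EXAMPLE_POSTCODE_ZONES) -> bool:
    # Normalise by stripping all whitespace and upper-casing.
    normalised = "".join(postcode.split()).upper()
    # The zone is the run of letters before the first digit.
    match = re.match(r"[A-Z]+", normalised)
    return bool(match) and match.group(0) in zones
```

Because `NF` and `GX` are simply absent from the set, both are rejected by the same rule, with no special case needed for `GX`.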
Restrict postcodes to valid UK postcode zones
For many libraries we're only using much more recent versions than utils has pinned anyway (like flask or pypdf), so it's easy to bring them up to the oldest version in use without any risk.

Others have outstanding CVEs (like jinja) so it's good to make sure we always have secure versions.

For others, like boto3, we've already assessed the changelogs elsewhere[^1].

Many major changes were just dropping support for old versions of python.

Some, like cachetools, needed a bit of checking as they changed import structure (but we're not affected in this instance).

[^1]: alphagov/notifications-api#4082
Bump subdeps
Pyupgrade[1] automatically upgrades syntax for newer versions of Python. Newer syntax is often friendlier and less verbose – for example f-string formatting rather than calls to `str.format`. It has a total of 42 rules[2]. This commit adds it to our linter config, and then fixes all the instances of old syntax it found (the vast majority of these were automatic by running Ruff with the `--fix` flag). 1. https://pypi.org/project/pyupgrade/ 2. https://docs.astral.sh/ruff/rules/#pyupgrade-up
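To make the `str.format` to f-string example concrete, here is the kind of before and after such a rewrite produces. The variable names and message are made up for illustration.

```python
name, count = "Notify", 3

# Older style: positional placeholders filled by str.format
old_style = "{} sent {} messages".format(name, count)

# Newer style: the same string written as an f-string
new_style = f"{name} sent {count} messages"
```

Both expressions build the same string; the f-string keeps each value next to the place it is used.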
bump test dependency versions
You can bypass newline truncation[1] by sending the dreaded unicode ‘hangul filler’ character in my personalisations. This could be utilised by a dodgy character who wants to hide some portion of the email content (if, say, they've utilised template injection to create a phishing email) *** 1. Defined at https://github.com/alphagov/notifications-utils/blob/88c57d5eb530b88794814e57249c2403227aa664/notifications_utils/markdown.py#L222
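A minimal sketch of the fix, assuming the character in question is U+3164 (HANGUL FILLER) and that stripping it before the text is rendered is enough. `strip_hangul_filler` is a hypothetical name, not the function added in this commit.

```python
HANGUL_FILLER = "\u3164"  # renders as blank space but is not whitespace


def strip_hangul_filler(value: str) -> str:
    # Remove every occurrence so it cannot be used to pad out content
    # and push text out of view.
    return value.replace(HANGUL_FILLER, "")
```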
Remove hangul filler character from emails
ruff doesn’t read `setup.cfg` so the rules defined there are redundant. This is a hangover from when we used to use `flake8` directly. I’ve copied some of the rules into `pyproject.toml`.

Selected rules:
- B902 is implemented in ruff as N804 and N805. Only copying N804 because N805 gets tripped up by sqlalchemy/sqlalchemy#9213
- E203 is still in preview in ruff, so add a note to include it later

Ignored rules:
- W503 and W504 aren’t likely to be implemented in ruff (astral-sh/ruff#4125)
- E203 is still in preview (https://docs.astral.sh/ruff/rules/whitespace-before-punctuation/) so we don’t need to worry about ignoring it (and maybe it will be fit for purpose once it’s out of preview)

Cyclomatic complexity:
- this repo has it set to 8, but everywhere else we use the default (10) so let’s not copy that into `pyproject.toml`
Remove flake8 config from `setup.cfg`
…tly named and called
…refixes first. When casting the list of keys to a(n unordered) set, the country code is chosen non-deterministically. This change instead turns the set into an ordered list, with the longest prefixes first, to avoid this issue.
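The ordering fix can be sketched like this, assuming prefixes are plain digit strings. The function names and the three example prefixes are illustrative and are not taken from the repository.

```python
def ordered_prefixes(prefixes):
    # Longest first, with ties broken alphabetically, so the result is
    # deterministic regardless of set iteration order.
    return sorted(set(prefixes), key=lambda p: (-len(p), p))


def match_prefix(number, prefixes):
    digits = number.lstrip("+")
    for prefix in ordered_prefixes(prefixes):
        if digits.startswith(prefix):
            return prefix
    return None
```

With prefixes `"1"` and `"1242"` both present, a number starting `1242` always matches the longer prefix, whereas iterating over an unordered set could pick either.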
…validation Revert "Merge pull request #1158 from alphagov/remove-old-phonenumber validation"
Bump core dependencies to latest versions
We can’t upgrade all apps until psf/requests#6730 is fixed (it stops the API from talking to DVLA)
Downgrade minimum version of requests to 2.32.3
anything else is asking for trouble
returning a CacheResultWrapper from a RequestCache.set-wrapped function allows the function to contextually decide whether the result should be cached.

internally this uses functools.singledispatch to detect CacheResultWrapper instances, so should allow further customization if the design of CacheResultWrapper isn't convenient for some situation
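A simplified sketch of how `functools.singledispatch` can drive this kind of decision. It is not the `RequestCache` implementation: the `value` and `should_cache` field names, the `cache_decision` function and the dict-backed `cached` decorator are all assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import singledispatch, wraps
from typing import Any


@dataclass
class CacheResultWrapper:
    value: Any
    should_cache: bool = True  # assumed field name


@singledispatch
def cache_decision(result):
    # Default for plain return values: always cache.
    return result, True


@cache_decision.register
def _(result: CacheResultWrapper):
    # Wrapped results carry their own decision.
    return result.value, result.should_cache


def cached(store: dict, key: str):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if key in store:
                return store[key]
            value, should_cache = cache_decision(func(*args, **kwargs))
            if should_cache:
                store[key] = value
            return value
        return wrapper
    return decorator
```

Registering another type with `cache_decision.register` would be the extension point the commit message alludes to, without touching the decorator itself.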
…pper `RequestCache`: add `CacheResultWrapper` to allow dynamic cache decisions
This fixes a couple of moderate security vulnerabilities.
Bump minimum jinja2 version to latest
these functions can be used to collect metrics on the greenlet switching that eventlet is performing behind the scenes and debug situations where we're seeing suboptimal behaviour.

not recommended to be enabled permanently as there is probably a minor performance penalty to accounting these times
this allows logging.formatting to be imported without implicitly importing all of these mostly-flask-related logging routines.

this is important because we want to be able to use logging.formatting from the gunicorn config, which executes before eventlet's monkeypatching has been performed, and implicitly importing flask before monkeypatching adds to the spew of errors emitted on app startup.
leaving scope for the thread_time function returning None as the greenlet variant will if the appropriate trace function isn't installed
eventlet: add greenlet time accounting and metric logging tools